refactor(proxy): LongCat immediate provider exclusion, non-LongCat model-only skip on 5xx by vi70x3 · Pull Request #7 · animaios/api-llm-localhost

vi70x3 · 2026-06-02T00:02:40Z

Summary

Replaces the previous recordConsecutiveFailure mechanism (which banned entire providers after 2 consecutive 5xx for ALL providers) with differentiated failure handling:

LongCat

Any failure (5xx, truncation, retryable error) → exclude the entire LongCat provider immediately for the session
Uses banPlatformFromSession() + addProviderModelsToSkipModels() to block all LongCat models
Clears sticky preference if pinned to LongCat

Non-LongCat providers

5xx errors → skip only the specific model that failed (skipModels.add(route.modelDbId)), other models from the same provider remain available
429 rate-limit errors → cooldown the specific key (setCooldown) but allow other keys for the same model to be tried
Truncation → skip only the specific model that produced truncated output

Changes in `server/src/routes/proxy.ts`

Mid-stream 5xx detection: LongCat → full provider exclusion, others → model-only skip
Pre-stream 5xx detection: same differentiation
Post-stream truncation: same differentiation
Mid-stream truncation: same differentiation
Retryable error block: LongCat → full provider exclusion, others → key/model-level handling preserved

All 156 existing tests pass. TypeScript compiles cleanly.

Summary by CodeRabbit

New Features
- Added automatic session-level provider bans triggered by consecutive errors and incomplete responses.
- Implemented intelligent fallback routing that automatically switches to alternative providers when the current provider is temporarily unavailable for a session.
- Added cooldown safeguard to prevent rapid re-routing for sticky sessions.
- Enhanced truncation detection to identify and handle incomplete API responses.
Tests
- Added comprehensive test coverage for session ban behavior, fallback routing, and provider error handling.

Implements the LongCat session ban feature in proxy.ts. When LongCat detects multiple API key use (auth/rate-limit errors) or returns truncated responses, the platform is banned from the sticky session. Future requests in that session route to non-LongCat models. Changes: - Extend stickySessionMap with bannedPlatforms?: Set<string> - Add isSessionBannedFromPlatform() to check session bans - Add banPlatformFromSession() to record platform bans - Add addLongcatModelsToSkipModels() to skip all LongCat models - Add isTruncatedResponse() to detect truncation keywords - Update getStickyKey() to return undefined for banned platforms - Update setStickyModel() to preserve bannedPlatforms across updates - Update pre-routing logic to check bans before routing - Update error handling to ban LongCat on auth/rate-limit/truncation - Add truncation detection after stream completes - Add truncation detection in mid-stream error handling

… consecutive failures - Extend stickySessionMap with consecutiveFailures tracking per provider - Add recordConsecutiveFailure(), resetConsecutiveFailures(), resetAllConsecutiveFailures() - Replace addLongcatModelsToSkipModels with generic addProviderModelsToSkipModels - Replace LongCat-specific auth/rate-limit ban with general 5xx consecutive failure detection (threshold: 2) - Generalize truncation detection to all providers (post-stream + mid-stream) - Update getStickyKey() to check bannedPlatforms for any platform - Update pre-routing ban check to be generic (any banned platform) - Add success path counter reset on both streaming and non-streaming paths - Remove LongCat-specific auth error ban, rate limit ban, and addLongcatModelsToSkipModels - Rename and rewrite tests from longcat-session-ban to provider-session-ban (32 test cases) - TypeScript compiles cleanly, all 150 tests pass

- Add isBanEligibleStatus() helper restricting to {500,502,503,504} - Improve mid-stream truncation detection with aggregated error sources - Pre-routing ban check now skips ALL banned platforms, not just preferredModel's - Only clear preferredModel when provider is actually banned (not on first 5xx) - Handle Error objects in isTruncatedResponse (instanceof check before JSON.stringify)

…or other sessions during 3-min cooldown window

…del-only skip on 5xx

coderabbitai · 2026-06-02T00:02:51Z

📝 Walkthrough

Walkthrough

This PR introduces session-scoped platform banning and a sticky-session cooldown mechanism to the proxy layer. When requests encounter consecutive provider failures or truncation, the session bans that platform and routes subsequent requests through fallback models. Sticky sessions persist through cooldown windows without routing to LongCat, while the router preserves preferred models even when they appear in skip lists.

Changes

Session Banning & Cooldown Integration

Layer / File(s)	Summary
Session Ban & Cooldown Specifications `.roo/specs/longcat-session-ban/`, `.roo/specs/longcat-sticky-cooldown/`, `.roo/specs/provider-5xx-session-ban/*`	Design, requirements, and tasks documents specify session banning triggered by LongCat errors, consecutive 5xx responses, or truncation; cooldown behavior; and provider-wide generalization with TTL-scoped persistence.
Session Ban Infrastructure & Helpers `server/src/routes/proxy.ts` (constants, type extension, helper functions)	`stickySessionMap` entries now track `bannedPlatforms: Set<string>` and `consecutiveFailures: Map<string, number>`. New exported helpers detect bans, record/reset consecutive failures, expand banned platforms into skip-model sets, and identify truncated responses.
Sticky Session Routing with Ban & Cooldown Integration `server/src/routes/proxy.ts` (sticky routing logic, `setStickyModel`), `server/src/services/router.ts` (skip-model filtering)	Sticky routing checks session bans and applies LongCat cooldown by updating `skipModels` while preserving sticky preferences. `setStickyModel` preserves ban state across model changes. Router change allows preferred models to bypass `skipModels` filtering for sticky-session pinning.
Error Detection & State Recovery for Streaming & Non-Stream `server/src/routes/proxy.ts` (stream completion, mid-stream error, non-stream retry, success paths)	Error handlers detect 5xx and truncation patterns, trigger ban recording or consecutive-failure tracking with automatic bans on threshold, and differentiate LongCat (provider-level bans) from non-LongCat (model-level skips). Success paths reset consecutive-failure counters. Mid-stream truncation gracefully completes streams without retry.
Session Ban Helper Tests `server/src/__tests__/routes/provider-session-ban.test.ts`	Comprehensive unit and integration tests for `isSessionBannedFromPlatform`, `banPlatformFromSession`, `recordConsecutiveFailure`, failure/ban counters, truncation detection, and lifecycle behavior (bans persist across model changes, counters reset on success, truncation triggers bans).
Cooldown Routing Behavior Tests `server/src/__tests__/routes/proxy-tools.test.ts`	New test suite validating LongCat cooldown exclusion for other sessions while preserving sticky routing, cooldown expiry, non-application to non-LongCat stickies, and ban precedence over cooldown.

Sequence Diagram

sequenceDiagram
  participant Client
  participant ProxyHandler
  participant StickySessionMap
  participant Router
  participant Provider
  Client->>ProxyHandler: POST /chat/completions (sticky session)
  ProxyHandler->>StickySessionMap: check sticky session + bans
  alt session banned from platform
    ProxyHandler->>ProxyHandler: add banned models to skipModels
  end
  alt cooldown active for sticky LongCat
    ProxyHandler->>ProxyHandler: add LongCat models to skipModels (other sessions only)
  end
  ProxyHandler->>Router: routeRequest with skipModels
  Router->>Router: exclude skipModels unless preferred sticky model
  Router-->>ProxyHandler: selected model
  ProxyHandler->>Provider: forward request
  Provider-->>ProxyHandler: 5xx error or truncated response
  ProxyHandler->>ProxyHandler: detect ban-eligible status
  alt 5xx or truncation
    ProxyHandler->>StickySessionMap: recordConsecutiveFailure()
    alt threshold reached
      StickySessionMap->>StickySessionMap: ban platform + update skipModels
    end
  end
  alt retry available
    ProxyHandler->>Router: routeRequest with updated skipModels
    Note over Router: LongCat now excluded, fall back to alternate
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

vi70x3/freellmapi#2: Extends sticky-session infrastructure with keyId/preferredKeyId handling that this PR builds upon to add bannedPlatforms and route-rejection via skipModels for banned sessions.

Poem

A rabbit hops through sticky schemes,
Banning paths when errors gleam,
When providers fail, the fallback flows,
While cooldowns keep the pinned one close—
Ban and retry, retry and win! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly and specifically describes the main behavioral change: LongCat receives immediate provider exclusion on failures, while non-LongCat providers skip only the specific model on 5xx errors.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch spec/longcat-sticky-cooldown

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request implements a session ban and fallback mechanism for LongCat and other providers, including a cooldown safeguard and generalized truncation/5xx error handling. The review feedback highlights a critical logic error in the cooldown safeguard where the active session skips itself while other sessions bypass the cooldown entirely. Additionally, the bypass added to the router's skipModels check breaks the fallback mechanism for sticky sessions during retryable errors. Finally, several newly added consecutive failure tracking functions are identified as dead code because the refactored error handler performs direct banning and skipping.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T00:04:36Z

+  // LongCat sticky cooldown: if the sticky model is on LongCat and was used
+  // within the last 3 minutes, exclude LongCat from the bandit router for all
+  // other sessions. The current sticky session keeps its pinned LongCat route.
+  // This prevents LongCat from seeing multiple sessions/keys from the same IP.
+  if (preferredModel) {
+    const db = getDb();
+    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+    if (prefRow?.platform === 'longcat') {
+      const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
+      const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
+      if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
+        const ageMs = Date.now() - cooldownEntry.lastUsed;
+        addProviderModelsToSkipModels(skipModels, 'longcat');
+        console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);
      }
    }
  }


🚨 Critical Logic Error in Cooldown Safeguard

There is a fundamental logic error in how the LongCat cooldown safeguard is implemented. The goal of the cooldown is to prevent other sessions from routing to LongCat when a sticky LongCat session is active (hot). However, the current implementation does the exact opposite:

For the active/hot session itself: preferredModel is set to LongCat, so it enters this block, retrieves its own session entry, and adds all LongCat models to its own skipModels set. It then relies on a router-level bypass (entry.model_db_id !== preferredModelDbId) to avoid skipping itself.

For other sessions: When a different session (or a new session) sends a request, its preferredModel is either undefined or a different model. Therefore, it never enters this block, never adds LongCat to its skipModels, and is free to route to LongCat via the bandit router.

This means the cooldown safeguard is completely ineffective at protecting LongCat from concurrent sessions, while introducing unnecessary complexity and bypasses for the active session.

Suggested Fix

To correctly implement the cooldown, the proxy should check if any session in the global stickySessionMap is currently "hot" on LongCat, and if so, add LongCat to the skipModels of the current request only if the current request is not already sticky-pinned to that LongCat model.

// LongCat sticky cooldown: if any OTHER session has used LongCat recently, // exclude LongCat from the bandit router for the current session. const currentSessionKey = getSessionKey(normalizedMessages, routingMode); const now = Date.now(); let isAnyOtherLongcatHot = false; for (const [key, entry] of stickySessionMap) { if (key === currentSessionKey) continue; if (now - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) { const db = getDb(); const row = db.prepare('SELECT platform FROM models WHERE id = ?').get(entry.modelDbId) as { platform: string } | undefined; if (row?.platform === 'longcat') { isAnyOtherLongcatHot = true; break; } } } if (isAnyOtherLongcatHot) { addProviderModelsToSkipModels(skipModels, 'longcat'); console.log(`[Sticky] LongCat cooldown active due to another active session — excluding LongCat from bandit routing`); }

gemini-code-assist · 2026-06-02T00:04:36Z


  for (const entry of sorted) {
-    if (skipModels?.has(entry.model_db_id)) continue;
+    if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;


⚠️ Fallback Mechanism Broken for Sticky Sessions

The bypass && entry.model_db_id !== preferredModelDbId added to the skipModels check breaks the fallback mechanism for sticky sessions when their pinned model encounters a retryable error (such as a 5xx or timeout).

How it breaks:

When a sticky model fails with a retryable error, it is added to skipModels (e.g., at line 1555 or 1579 in proxy.ts) so that the retry loop can fall back to a different model.

On the next retry attempt, routeRequest is called with the same preferredModelDbId and the updated skipModels set.

Because of && entry.model_db_id !== preferredModelDbId, the router ignores the fact that the preferred model is in skipModels and routes to it again.

This forces the proxy to repeatedly retry the failing preferred model (or its other keys) up to MAX_RETRIES (20 times), instead of falling back to another provider/model immediately.

Solution:

By fixing the cooldown safeguard logic in proxy.ts (so that the active session never adds its own preferred model to skipModels), we can completely remove this bypass from the router. This allows the router to correctly skip the preferred model if it fails during the retry loop.

Suggested change

if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;

if (skipModels?.has(entry.model_db_id)) continue;

gemini-code-assist · 2026-06-02T00:04:36Z

+function recordConsecutiveFailure(
+  messages: ChatMessage[],
+  routingMode: RoutingMode,
+  provider: string,
+  skipModels: Set<number>,
+  modelDbId?: number,
+): void {
+  const key = getSessionKey(messages, routingMode);
+  if (!key) return;
+  let entry = stickySessionMap.get(key);
+  if (!entry) {
+    if (modelDbId === undefined) return;
+    entry = { modelDbId, lastUsed: Date.now() };
+  }
+  if (!entry.consecutiveFailures) entry.consecutiveFailures = new Map();
+  const count = (entry.consecutiveFailures.get(provider) ?? 0) + 1;
+  entry.consecutiveFailures.set(provider, count);
+  entry.lastUsed = Date.now();
+  stickySessionMap.set(key, entry);
+  console.log(`[Sticky] consecutive 5xx for ${provider}: ${count}/2 session=${key.slice(0, 8)}`);
+  if (count >= 2) {
+    if (!entry.bannedPlatforms) entry.bannedPlatforms = new Set();
+    entry.bannedPlatforms.add(provider);
+    addProviderModelsToSkipModels(skipModels, provider);
+    entry.consecutiveFailures.delete(provider);
+    entry.lastUsed = Date.now();
+    stickySessionMap.set(key, entry);
+    console.log(`[Sticky] banned platform=${provider} for session=${key.slice(0, 8)} | bannedPlatforms=${Array.from(entry.bannedPlatforms).join(',')}`);
+  }
+}


🧹 Dead Code: Unused Consecutive Failure Tracking

The functions recordConsecutiveFailure, resetConsecutiveFailures, and resetAllConsecutiveFailures (along with the consecutiveFailures field in stickySessionMap's value type) are now completely unused dead code.

The refactored error handling in handleChatCompletion directly bans LongCat or skips non-LongCat models on 5xx/truncation errors immediately, without using consecutive failure counters.

To keep the codebase clean and maintainable, please remove these unused functions, their exports, and the consecutiveFailures field from the stickySessionMap type definition.

(Note: You will also need to update the newly added provider-session-ban.test.ts to remove tests for these deleted functions, and instead add tests for the actual request-level banning/skipping behavior.)

coderabbitai

Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/src/routes/proxy.ts (1)
1412-1495: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Mid-stream LongCat retryable errors still escape without a session ban.

This catch only bans LongCat on 5xx or truncation. A mid-stream timeout, 429, auth failure, or connection reset will go down the generic response.failed path and leave LongCat eligible on the next request, which breaks the “ban LongCat on any retryable failure” contract.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/src/routes/proxy.ts` around lines 1412 - 1495, The handler currently
only bans LongCat for 5xx (via getErrorStatus/isBanEligibleStatus) or truncation
(isTruncatedResponse), allowing mid-stream timeout/429/auth/connection-reset
errors to fall through to the generic response.failed path; add a reusable check
(e.g., isRetryableStreamError(streamErr)) that detects retryable mid-stream
failures (status 429, auth failures, timeouts, ECONNRESET, network errors,
etc.), call it alongside the 5xx/truncation checks, and if route.platform ===
'longcat' perform the same ban actions
(banPlatformFromSession(normalizedMessages, routingMode, 'longcat',
route.modelDbId) and addProviderModelsToSkipModels(skipModels, 'longcat'))
before ending the stream and logging so LongCat is excluded consistently on any
retryable mid-stream failure.

🧹 Nitpick comments (1)

.roo/specs/longcat-sticky-cooldown/tasks.md (1)

6-8: 💤 Low value

Clean up markdown emphasis markers.

Tasks 2 and 4 have spaces inside emphasis markers (e.g., ** Replace** and ** Update**), which triggers markdown linting warnings. While functional, removing these spaces improves document quality.

✨ Suggested cleanup

-- [x] **Replace** existing cooldown check logic in [`handleChatCompletion()`](server/src/routes/proxy.ts:1243) — instead of clearing `preferredModel`/`preferredKeyId`, add LongCat models to `skipModels` via `addProviderModelsToSkipModels(skipModels, 'longcat')`. Keep `preferredModel`/`preferredKeyId` intact.
-- [x] **Modify** `skipModels` check in [`server/src/services/router.ts`](server/src/services/router.ts:539) to exclude the sticky model: `if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;`
-- [x] **Update** unit tests in [`server/src/__tests__/routes/proxy-tools.test.ts`](server/src/__tests__/routes/proxy-tools.test.ts) to match new behavior: cooldown adds LongCat to skipModels instead of clearing sticky preference
+- [x] **Replace** existing cooldown check logic in [`handleChatCompletion()`](server/src/routes/proxy.ts:1243) — instead of clearing `preferredModel`/`preferredKeyId`, add LongCat models to `skipModels` via `addProviderModelsToSkipModels(skipModels, 'longcat')`. Keep `preferredModel`/`preferredKeyId` intact.
+- [x] **Modify** `skipModels` check in [`server/src/services/router.ts`](server/src/services/router.ts:539) to exclude the sticky model: `if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;`
+- [x] **Update** unit tests in [`server/src/__tests__/routes/proxy-tools.test.ts`](server/src/__tests__/routes/proxy-tools.test.ts) to match new behavior: cooldown adds LongCat to skipModels instead of clearing sticky preference

As per coding guidelines, these are markdown formatting improvements flagged by markdownlint-cli2 (MD037).

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.roo/specs/longcat-sticky-cooldown/tasks.md around lines 6 - 8, Cleanup the
markdown emphasis markers in the checklist lines by removing the extra spaces
inside bold markers so they read "**Replace**" and "**Update**" instead of "**
Replace**" and "** Update**"; locate the two checklist items referencing
replacement of cooldown logic and updating unit tests (the tasks labeled 2 and
4) and edit their text to remove the inner-space in the bold delimiters to
satisfy markdownlint MD037 while keeping the rest of the sentences unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.roo/specs/longcat-session-ban/design.md:
- Line 85: In the sentence starting "Records a platform ban in the sticky
session. Called when truncation, auth, or rate-limit errors is detected on
LongCat.", correct the subject-verb agreement by changing "is" to "are" so it
reads "Called when truncation, auth, or rate-limit errors are detected on
LongCat." Use the same surrounding text ("Records a platform ban in the sticky
session" / "LongCat") to locate the line in the design.md.
- Line 5: The doc is contradictory about router changes: update the sentence
that mentions "the router requires minimal changes — only the LongCat smart-auto
preference needs to respect session bans" to instead state that no changes to
router.ts are required; keep the rest as-is and explicitly note that all ban
detection and session management is implemented in proxy.ts and integrated into
handleChatCompletion(), and reference LongCat smart-auto as unaffected so
readers know router.ts does not need edits.

In @.roo/specs/longcat-session-ban/requirements.md:
- Around line 11-15: The Context section has multiple formatting and grammar
issues—remove the stray period and fix capitalization in the opening sentence
(replace ". and It uses" with "and it uses"), normalize the inline code snippet
that currently reads "`{ modelDbId, + optional `keyId` + `lastUsed` }`" to a
clear code span like `{ modelDbId, optional keyId, lastUsed }`, delete or merge
the orphan fragments ("and eviction." and the standalone "and"), break up and
rewrite the run-on paragraph so each idea is a clear sentence referencing the
sticky session implementation in server/src/routes/proxy.ts and the
longcat-sticky-key spec, and ensure references to functions/files
(clearStickyKey(), preferredModel, routeRequest(), shouldSkipModelOnRetry(),
isRetryableError()) remain correct and readable.
- Around line 5-7: Fix the grammar in the Overview sentence and spacing in the
numbered list: change "the system must to:" to "the system must:" in the
sentence starting "When a LongCat sticky session encounters an error — whether
it's an auth failure (401/40403) or a 'truncated' / 'conflict' error from the
provider — the system must to:" and ensure there are spaces after commas in the
numbered list items (e.g., "1. Detect the error, 2. Ban..." → "1. Detect the
error, 2. Ban..." with proper spacing after each comma) so the sentence reads
correctly and the list is properly spaced.
- Around line 24-27: The table in requirements.md has duplicated row number
"11"; update the second occurrence (the row listing `server/src/routes/proxy.ts`
with range 1036-1053) to "12" so row numbers are unique and sequential; ensure
any subsequent rows are renumbered accordingly if needed.

In @.roo/specs/longcat-session-ban/tasks.md:
- Around line 15-16: Task 2 in .roo/specs/longcat-session-ban/tasks.md contains
a duplicated list item "Add diagnostic logging"; remove the redundant duplicate
so only a single "Add diagnostic logging" entry remains for task 2 and ensure
the surrounding list/numbering (the task 2 bullet or numbered list) remains
consistent.

In @.roo/specs/longcat-sticky-cooldown/requirements.md:
- Around line 74-76: The requirement "No Router Changes" is incorrect because
the router's skipModels logic must allow a sticky session's preferred model;
update the router code handling skipModels in server/src/services/router.ts by
modifying the check to skip models only when entry.model_db_id is in skipModels
AND is not equal to the session's preferredModelDbId (i.e., change the
conditional to effectively: if (skipModels?.has(entry.model_db_id) &&
entry.model_db_id !== preferredModelDbId) continue), ensuring the router honors
sticky sessions while still skipping cooldowned models.

In @.roo/specs/provider-5xx-session-ban/design.md:
- Around line 1-10: The spec is out of sync with proxy.ts: update the design
text to describe the runtime behavior where LongCat providers are excluded
immediately on any 5xx or on detected truncation, non-LongCat providers are
skipped at the model level via skipModels.add(route.modelDbId) (no production
2-consecutive-failure threshold), truncation is detected by
isTruncatedResponse(...) using keyword heuristics (e.g., "truncated",
"context_length_exceeded", "token_limit"), and bannedPlatforms /
recordConsecutiveFailure are currently test-only exports; either amend the
document to match this immediate-exclusion behavior or move the older
2-consecutive-5xx design into archive/alternatives with a note that
recordConsecutiveFailure and consecutiveFailures are test-only.

In @.roo/specs/provider-5xx-session-ban/requirements.md:
- Around line 5-12: The requirements claim that both "5xx consecutive failure
ban" and "truncation detection ban" apply to all providers, but the proxy
implementation only calls banPlatformFromSession()/bannedPlatforms for
route.platform === 'longcat' while non-LongCat errors use
skipModels.add(route.modelDbId); update the implementation so both triggers use
banPlatformFromSession() and the bannedPlatforms/session ban flow for all
providers (remove the longcat-only guard around banPlatformFromSession and
ensure truncation and 5xx handling paths call banPlatformFromSession() and/or
update bannedPlatforms consistently), and keep skipModels behavior for
model-level skipping where appropriate (ensure functions referenced:
banPlatformFromSession, bannedPlatforms, skipModels, and the 5xx/truncation
handling code paths).

In @.roo/specs/provider-5xx-session-ban/tasks.md:
- Around line 9-29: Update the task list to reflect the final LongCat/model-skip
design instead of the old "2 consecutive 5xx bans the provider" flow: remove or
replace the checked items that describe "increment consecutiveFailures and ban
at count >= 2" and instead describe the shipped behavior where a LongCat
response triggers immediate LongCat exclusion and non-LongCat model-only
skipping; adjust the entries that mention recordConsecutiveFailure(),
resetConsecutiveFailures(), resetAllConsecutiveFailures(),
banPlatformFromSession(), bannedPlatforms, addProviderModelsToSkipModels(), and
consecutiveFailures to document the new immediate exclusion and model-only skip
behavior (including parameters, sticky entry handling, TTL refresh, and logging
semantics) so follow-up work and tests target the current design.

In `@server/src/__tests__/routes/provider-session-ban.test.ts`:
- Around line 208-259: The tests under recordConsecutiveFailure still assert the
old behavior (platform-wide bans after two 5xxs and truncation-driven bans for
any provider); update the test cases (the ones using getSessionKey,
stickySessionMap, skipModels and recordConsecutiveFailure) to assert the new
semantics: for platform "longcat" expect an immediate platform ban on first
failure (stickySessionMap entry.bannedPlatforms contains 'longcat' after one
recordConsecutiveFailure) while for non-LongCat providers (e.g., 'groq') expect
only the specific model id to be added to skipModels (and
entry.consecutiveFailures incremented) but not a platform-wide ban; adjust the
related assertions in the other referenced blocks (lines 261-312, 394-430,
433-443) to match these rules and remove assumptions about two-failure bans or
truncation causing platform-wide bans.

In `@server/src/__tests__/routes/proxy-tools.test.ts`:
- Around line 895-973: The test never asserts which provider was chosen, so it
doesn't verify non-sticky-session exclusion; update the test (the "excludes
LongCat..." spec) to assert the mocked routing by checking the routedProvider
after the POST — e.g. expect(routedProvider).not.toBe('longcat') or
expect(routedProvider).toBe('groq') — and keep the existing session/key setup
that uses makeMessages and getSessionKey with stickySessionMap to ensure the
otherMessages represent a different session; ensure the mocked fetch spy
assigning routedProvider is in scope so the assertion observes the chosen
provider.
- Around line 840-849: Update the test POST calls to /api/keys to use the admin
credential instead of the unified/v1 helper; specifically, for each request(app,
'POST', '/api/keys', {...}) replace or augment the call so it supplies the admin
API key (the header used by admin-gated routes, e.g., Authorization: `Bearer
<adminKey>` or the test helper variant that attaches the admin key) so these
seeds exercise the /api/* admin gating (affects the calls shown and the other
occurrences at the ranges noted); keep the route and payload unchanged but
ensure auth is the admin key, not the v1/unified helper.

In `@server/src/routes/proxy.ts`:
- Around line 1243-1256: The cooldown check currently only runs when the current
request is already pinned (preferredModel), so it never excludes LongCat for
other sessions; instead, scan stickySessionMap for any sticky entry whose
platform is 'longcat' and whose lastUsed is within LONGCAT_STICKY_COOLDOWN_MS
and whose session key differs from the current session to trigger the exclusion.
Concretely, compute the current session key with
getSessionKey(normalizedMessages, routingMode), then iterate
stickySessionMap.values() (or entries) looking for an entry with entry.platform
=== 'longcat' and Date.now() - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS and
entry.sessionKey !== currentSessionKey; if found, call
addProviderModelsToSkipModels(skipModels, 'longcat') (and log) — do this
outside/independent of preferredModel so other sessions see LongCat excluded.

In `@server/src/services/router.ts`:
- Line 539: The current check lets preferredModelDbId bypass all skips; fix by
splitting skipSets into nonOverrideableSkips vs overrideableSkips and change the
routing condition so nonOverrideableSkips always win (e.g., if
(nonOverrideableSkips.has(entry.model_db_id)) continue; else if
(overrideableSkips.has(entry.model_db_id) && entry.model_db_id !==
preferredModelDbId) continue;), and update where skips are added (in proxy.ts)
to place hard failures into nonOverrideableSkips so sticky sessions cannot route
back to a hard-skipped model.

---

Outside diff comments:
In `@server/src/routes/proxy.ts`:
- Around line 1412-1495: The handler currently only bans LongCat for 5xx (via
getErrorStatus/isBanEligibleStatus) or truncation (isTruncatedResponse),
allowing mid-stream timeout/429/auth/connection-reset errors to fall through to
the generic response.failed path; add a reusable check (e.g.,
isRetryableStreamError(streamErr)) that detects retryable mid-stream failures
(status 429, auth failures, timeouts, ECONNRESET, network errors, etc.), call it
alongside the 5xx/truncation checks, and if route.platform === 'longcat' perform
the same ban actions (banPlatformFromSession(normalizedMessages, routingMode,
'longcat', route.modelDbId) and addProviderModelsToSkipModels(skipModels,
'longcat')) before ending the stream and logging so LongCat is excluded
consistently on any retryable mid-stream failure.

---

Nitpick comments:
In @.roo/specs/longcat-sticky-cooldown/tasks.md:
- Around line 6-8: Cleanup the markdown emphasis markers in the checklist lines
by removing the extra spaces inside bold markers so they read "**Replace**" and
"**Update**" instead of "** Replace**" and "** Update**"; locate the two
checklist items referencing replacement of cooldown logic and updating unit
tests (the tasks labeled 2 and 4) and edit their text to remove the inner-space
in the bold delimiters to satisfy markdownlint MD037 while keeping the rest of
the sentences unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 233973c3-d3bf-4112-bddc-808be094799e

📥 Commits

Reviewing files that changed from the base of the PR and between d9c5a73 and 95722bc.

📒 Files selected for processing (13)

.roo/specs/longcat-session-ban/design.md
.roo/specs/longcat-session-ban/requirements.md
.roo/specs/longcat-session-ban/tasks.md
.roo/specs/longcat-sticky-cooldown/design.md
.roo/specs/longcat-sticky-cooldown/requirements.md
.roo/specs/longcat-sticky-cooldown/tasks.md
.roo/specs/provider-5xx-session-ban/design.md
.roo/specs/provider-5xx-session-ban/requirements.md
.roo/specs/provider-5xx-session-ban/tasks.md
server/src/__tests__/routes/provider-session-ban.test.ts
server/src/__tests__/routes/proxy-tools.test.ts
server/src/routes/proxy.ts
server/src/services/router.ts

coderabbitai · 2026-06-02T00:10:30Z

+
+## Architecture Overview
+
+The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires minimal changes — only the LongCat smart-auto preference needs to respect session bans. All ban detection and session management happens in the proxy layer.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify router change requirements.

Line 5 states "the router requires minimal changes — only the LongCat smart-auto preference needs to respect session bans," but lines 374-380 conclude "No router changes needed." This contradiction creates ambiguity about whether router.ts modifications are in scope.

📝 Suggested clarification

Revise line 5 to align with the final decision:

-The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires minimal changes — only the LongCat smart-auto preference needs to respect session bans. All ban detection and session management happens in the proxy layer. +The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires no changes — the proxy passes `skipModels` containing all LongCat model IDs, and the router's existing skip logic handles the ban automatically. All ban detection and session management happens in the proxy layer.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires minimal changes — only the LongCat smart-auto preference needs to respect session bans. All ban detection and session management happens in the proxy layer.

The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires no changes — the proxy passes `skipModels` containing all LongCat model IDs, and the router's existing skip logic handles the ban automatically. All ban detection and session management happens in the proxy layer.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-session-ban/design.md at line 5, The doc is contradictory about router changes: update the sentence that mentions "the router requires minimal changes — only the LongCat smart-auto preference needs to respect session bans" to instead state that no changes to router.ts are required; keep the rest as-is and explicitly note that all ban detection and session management is implemented in proxy.ts and integrated into handleChatCompletion(), and reference LongCat smart-auto as unaffected so readers know router.ts does not need edits.

coderabbitai · 2026-06-02T00:10:30Z

+
+### 2. `banPlatformFromSession()` — [`proxy.ts`](server/src/routes/proxy.ts)
+
+Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors is detected on LongCat.


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix subject-verb agreement.

"truncation, auth, or rate-limit errors is detected" should use "are" for the plural subject.

✏️ Proposed fix

-Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors is detected on LongCat. +Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors are detected on LongCat.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors is detected on LongCat.

Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors are detected on LongCat.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-session-ban/design.md at line 85, In the sentence starting "Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors is detected on LongCat.", correct the subject-verb agreement by changing "is" to "are" so it reads "Called when truncation, auth, or rate-limit errors are detected on LongCat." Use the same surrounding text ("Records a platform ban in the sticky session" / "LongCat") to locate the line in the design.md.

coderabbitai · 2026-06-02T00:10:30Z

+When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must to:
+
+1. Detect the error, 2. Ban the entire `longcat` platform for this sticky session, 3. Fall back to the next best non-LongCat model via normal routing, 4. Update the sticky session to point to the new fallback model, 5. Never route this session to LongCat again (until the session expires via TTL)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix grammar and formatting in Overview section.

Multiple issues:

Line 5: "must to:" should be "must:"

Line 7: Missing spaces after commas in the numbered list

✏️ Proposed fixes

-When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must to: +When a LongCat sticky session encounters an error — whether it's an auth failure (401/403) or a "truncated" / "conflict" error from the provider — the system must: -1. Detect the error, 2. Ban the entire `longcat` platform for this sticky session, 3. Fall back to the next best non-LongCat model via normal routing, 4. Update the sticky session to point to the new fallback model, 5. Never route this session to LongCat again (until the session expires via TTL) +1. Detect the error +2. Ban the entire `longcat` platform for this sticky session +3. Fall back to the next best non-LongCat model via normal routing +4. Update the sticky session to point to the new fallback model +5. Never route this session to LongCat again (until the session expires via TTL)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-session-ban/requirements.md around lines 5 - 7, Fix the grammar in the Overview sentence and spacing in the numbered list: change "the system must to:" to "the system must:" in the sentence starting "When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a 'truncated' / 'conflict' error from the provider — the system must to:" and ensure there are spaces after commas in the numbered list items (e.g., "1. Detect the error, 2. Ban..." → "1. Detect the error, 2. Ban..." with proper spacing after each comma) so the sentence reads correctly and the list is properly spaced.

coderabbitai · 2026-06-02T00:10:30Z

+ The existing sticky sessions feature lives in [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:13-112). and It uses an SHA-1 hash of `routingMode + firstUserMessage` to identify sessions, and stores `{ modelDbId, + optional `keyId` + `lastUsed` } with a 30-min TTL and 500-entry max.
+
+ and eviction.
+
+ The existing LongCat sticky key feature ([`longcat-sticky-key` spec](../../roo/specs/longcat-sticky-key/)) extends this to also prefer using the **same API key** within a session. For LongCat specifically, because LongCat benefits from session continuity at the key level. same key = same session context on their server side). The current behavior on auth errors (401/403) is to [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) — which clears the sticky key but **keep the sticky model pinned to LongCat** via [`preferredModel`](server/src/routes/proxy.ts:1036-1037). On retry, [`routeRequest()`](server/src/services/router.ts:458) still has `preferredModel` pointing to LongCat, and tries **another LongCat key** via round-robin. and **LongCat detects different keys usage for the same session** → the "multiple API keys" problem. The [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) function explicitly **does NOT skip the model** for auth errors or rate limit errors — so auth failures on LongCat result in key rotation within the same LongCat model, which is exactly what LongCat detects as "multiple API keys use." for the same session. Similarly, when LongCat returns a "truncated" or "conflict" error ( the provider truncates the response mid-stream, the current behavior is to silently switch to a different key and but the session continues on LongCat with a different key — same problem. The "truncated" error pattern is also detected by [`isRetryableError()`](server/src/routes/proxy.ts:409-4428) which checking for 429, 413, 400, 404, 408, 409, 422, 500, 502, 503, 504, andrate limit`, `quota`, `aborted`, `timeout`, `econnrefused`, `econnreset`, `unauthorized`, `forbidden`, `invalid api key`, `no longer available`, `model not found`, `bad request`, `invalid json payload`. TheisRetryableError()` function returns true for all these cases, meaning the proxy will retry with a different model/keykey. However, [`shouldSkipModelOnRetry()`](server/src/routes/proxy.ts:430-432) returns `true` only for rate-limit and auth errors — it does NOT skip the model. This means auth errors and LongCat result in key rotation within the same model, which is exactly the behavior LongCat detects as "multiple API keys use" for the same session. The existing [`clearStickyKey()`](server/src/routes/proxy.ts:89-98) only clears the sticky key but **keoes `preferredKeyId` to `undefined` — but the sticky model remains pinned to LongCat. On the next retry, [`routeRequest()`](server/src/services/router.ts:458) still receives `preferredModel` pointing to LongCat, and tries another LongCat key via round-robin. The LongCat smart-auto preference in [`router.ts`](server/src/services/router.ts:498-527) also means LongCat is still tried first in smart mode, so the retry will likely hit LongCat again. ## Functional Requirements ### FR-1: Detect Multiple Key Use on LongCat When a LongCat provider returns an error indicating that the same API key is been used for the same session ( the system must detect this condition. the error response contains language signaling multiple key use. a single session. Detection patterns: - Auth errors (401/403) — the current behavior already clears the sticky key but but tries another key on the same model - Rate-limit errors (429) — same pattern: key rotation within the same model - "Truncated" / "conflict" errors — LongCat truncates responses mid-stream when the response is shorter than expected, indicating the provider cut off the session. Detection keywords: "truncated", "truncation", "conflict", "length", "maximum length", "context_length_exceeded", "token_limit" - Any error message that the provider is complaining about session length or capacity limits for the current conversation. ### FR-2: Ban LongCat Platform for Sticky Session When FR-1 is triggered, the system must ban the entire `longcat` platform for the current sticky session. This means: - All LongCat model IDs must be added to `skipModels` in the retry loop - The sticky session must be updated to point to the new fallback model instead of LongCat - The session must never be routed to LongCat again for until the session expires via TTL (30 min) or The ban must persist across multiple retry attempts within the same request. ### FR-3: Fallback to Next Best Non-LongCat Model After banning LongCat, the retry loop must fall through to the next best available model via the existing Thompson Sampling / smart routing logic. The new model should be selected based on the normal scoring algorithm ( success rate + speed + TTFB + intelligence for smart mode, success rate + speed in balanced mode). ### FR-4: Update Sticky Session to New Model On successful fallback, the sticky session must be updated to point to the new fallback model and `modelDbId` + `keyId`. The sticky key feature should be cleared for the new model since since the fallback is not LongCat, since the sticky key preference only applies to LongCat sessions. ### FR-5: Never Route Session to LongCat Again Once a session is banned from LongCat, it must never be routed to LongCat again for the remainder of that session's lifetime (30 min TTL). This means: - The `stickySessionMap` entry must include a `bannedPlatforms` field (or `Set<string>`) to track which platforms are banned for this session - On subsequent requests in the same session, `getStickyModel()` returns the preferred model, but the proxy layer must check if the session is banned from LongCat and skip LongCat models before calling `routeRequest()` - The LongCat smart-auto preference in `router.ts` must also be suppressed for banned sessions ( the router should not boost LongCat entries to the front for sessions that are banned from LongCat ### FR-6: Truncated Response Detection When a LongCat streaming response is received, the proxy must check the response content for signs of truncation. If detected, the session must be banned from LongCat immediately, even if the stream has already started (headers sent, the client has already received partial data). The system must: - Log the truncation detection - Record the ban in the sticky session - Add all LongCat model IDs to `skipModels` - End stream and ban for future requests - The client receives the truncated partial response as-is; future requests in this session will route to non-LongCat models ### FR-7: Auth Error Handling for LongCat Sessions When an auth error (401/403) occurs on a LongCat sticky session: - Clear the sticky key via `clearStickyKey()` (existing behavior) - Additionally ban the LongCat platform for this session via the new `banPlatformFromSession()` function - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success ### FR-8: Rate-Limit Error Handling for LongCat Sessions When a rate-limit error (429) occurs on a LongCat sticky session: - Ban the LongCat platform for this session via `banPlatformFromSession()` - Add all LongCat model IDs to `skipModels` - Set `preferredKeyId` to `undefined` - On retry, fall through to the next best non-LongCat model - Update sticky session to the new model on success - Note: rate-limit errors on LongCat do NOT clear the entire sticky session (the session may still work with a different key on a different model). Only ban LongCat specifically. ### FR-9: Existing Behavior Preserved for Non-LongCat Sessions All existing sticky session behavior for non-LongCat providers must remain unchanged. The new ban mechanism only applies exclusively to LongCat sessions. Non-LongCat sessions that never have platform bans. ### FR-10: Session Expiry Clears Bans When a sticky session expires ( via TTL (30 min), the `bannedPlatforms` set is also cleared. This is natural — expired sessions are evicted from the `stickySessionMap` entirely, including all associated data. ### FR-11: No Database Schema Changes The ban mechanism is purely in-memory, using the existing `stickySessionMap`. No database schema changes are required. ### FR-12: Minimal Router Changes The router (`server/src/services/router.ts`) should not need significant changes. The only change is that the LongCat smart-auto preference logic should skip sessions that are banned from LongCat. The proxy layer handles all ban detection and session management. The router remains provider-agnostic. ### FR-13: No UI Changes This is a backend-only feature. No client-side changes are needed. ## Non-Functional Requirements ### NFR-1: Backward Compatibility Existing sessions without `bannedPlatforms` (from before this feature or for non-LongCat providers) must continue to work. The `bannedPlatforms` field must be optional in the sticky session map value type. ### NFR-2: Thread Safety The existing `stickySessionMap` is a plain `Map` with no locking ( single-threaded Node.js). The extended map follows the same pattern — no additional concurrency concerns. ### NFR-3: Minimal Performance Impact The ban check adds one `Set` lookup per one optional field check in the sticky session map entry per one DB query to check if a model is on a banned platform. No additional I/O beyond what already exists. ### NFR-4: Test Coverage New unit tests must cover: - Multiple key use detection (auth + rate limit + truncated) - Session ban persistence across retries - Fallback to next best model - Sticky session update on success - Session expiry clearing bans - Non-LongCat sessions unaffected ## Files Requiring Modification | # | File | Change Type | Description |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clean up Context section formatting and grammar.

The Context section contains multiple formatting errors:

Extra period and incorrect capitalization: ". and It uses"

Malformed inline code with "+ optional"

Incomplete sentences: "and eviction." and standalone "and"

Run-on text that is difficult to parse

🧰 Tools

🪛 LanguageTool

[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...within the same LongCat model, which is exactly what LongCat detects as "multiple API k...

(EXACTLY_PRECISELY)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...404, 408, 409, 422, 500, 502, 503, 504, andrate limit, quota, aborted, timeout`, ...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...proxy will retry with a different model/keykey. However, [shouldSkipModelOnRetry()](...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

[style] ~15-~15: Consider an alternative for the overused word “exactly”.
Context: ...otation within the same model, which is exactly the behavior LongCat detects as "multip...

(EXACTLY_PRECISELY)

[grammar] ~15-~15: Ensure spelling is correct
Context: ...89-98) only clears the sticky key but **keoes preferredKeyId to undefined — but t...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.22.1)

[warning] 11-11: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

[warning] 15-15: Spaces inside code span elements

(MD038, no-space-in-code)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-session-ban/requirements.md around lines 11 - 15, The Context section has multiple formatting and grammar issues—remove the stray period and fix capitalization in the opening sentence (replace ". and It uses" with "and it uses"), normalize the inline code snippet that currently reads "`{ modelDbId, + optional `keyId` + `lastUsed` }`" to a clear code span like `{ modelDbId, optional keyId, lastUsed }`, delete or merge the orphan fragments ("and eviction." and the standalone "and"), break up and rewrite the run-on paragraph so each idea is a clear sentence referencing the sticky session implementation in server/src/routes/proxy.ts and the longcat-sticky-key spec, and ensure references to functions/files (clearStickyKey(), preferredModel, routeRequest(), shouldSkipModelOnRetry(), isRetryableError()) remain correct and readable.

coderabbitai · 2026-06-02T00:10:30Z

+ | | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat |
+| | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path |
+| | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions |
+ | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix duplicate row numbering in files table.

The table has two rows numbered as row 11 (lines 27-28). This creates ambiguity in the modification checklist.

📋 Suggested fix

Renumber the second occurrence of row 11 to row 12:

| | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions | - | | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` | + | | 12 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

| | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat |

| | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path |

| | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions |

| | 11 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |

| | 8 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1245-1281) | Edit | Update error handling in retry loop to detect multiple key use + ban LongCat |

| | 9 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1147-1149) | Edit | Add truncated response detection in streaming success path |

| | 10 | [`server/src/services/router.ts`](server/src/services/router.ts:498-527) | Edit | Skip LongCat boost for banned sessions |

| | 12 | [`server/src/routes/proxy.ts`](server/src/routes/proxy.ts:1036-1053) | Edit | Pass `bannedPlatforms` to `routeRequest()` via `skipModels` |

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.roo/specs/longcat-session-ban/requirements.md around lines 24 - 27, The table in requirements.md has duplicated row number "11"; update the second occurrence (the row listing `server/src/routes/proxy.ts` with range 1036-1053) to "12" so row numbers are unique and sequential; ensure any subsequent rows are renumbered accordingly if needed.

coderabbitai · 2026-06-02T00:10:30Z

+    await request(app, 'POST', '/api/keys', {
+      platform: 'longcat',
+      key: 'lc_cooldown_active_test',
+      label: 'cooldown-active-longcat',
+    });
+    await request(app, 'POST', '/api/keys', {
+      platform: 'groq',
+      key: 'gsk_cooldown_active_test',
+      label: 'cooldown-active-groq',
+    });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the admin key for these /api/keys setup calls.

These tests seed providers through /api/keys, but they rely on a helper that only attaches auth for /v1/*. That means the new coverage is not exercising the route-gating contract and can mask overlap regressions between admin and unified credentials.

As per coding guidelines, The admin key must gate /api/* routes; the unified API key must gate /v1/* routes — they must never overlap, and using one against the wrong route returns 401.

Also applies to: 909-918, 989-998, 1058-1062, 1123-1127, 1176-1180

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 840 - 849, Update the test POST calls to /api/keys to use the admin credential instead of the unified/v1 helper; specifically, for each request(app, 'POST', '/api/keys', {...}) replace or augment the call so it supplies the admin API key (the header used by admin-gated routes, e.g., Authorization: `Bearer <adminKey>` or the test helper variant that attaches the admin key) so these seeds exercise the /api/* admin gating (affects the calls shown and the other occurrences at the ranges noted); keep the route and payload unchanged but ensure auth is the admin key, not the v1/unified helper.

coderabbitai · 2026-06-02T00:10:30Z

+  it('excludes LongCat from bandit routing for non-sticky sessions during cooldown', async () => {
+    const db = getDb();
+
+    // Set up a sticky session on LongCat for ONE session (within cooldown)
+    const stickyMessages = makeMessages('sticky longcat session during cooldown');
+    const stickyKey = getSessionKey(stickyMessages, 'balanced');
+    const longcatRow = db.prepare('SELECT id FROM models WHERE platform = ? AND enabled = 1').get('longcat') as { id: number } | undefined;
+    expect(longcatRow).toBeDefined();
+    (stickySessionMap as Map<any, any>).set(stickyKey, {
+      modelDbId: longcatRow!.id,
+      lastUsed: Date.now() - 1000, // within cooldown
+    });
+
+    // Add keys for both providers
+    await request(app, 'POST', '/api/keys', {
+      platform: 'longcat',
+      key: 'lc_cooldown_exclusion_test',
+      label: 'cooldown-exclusion-longcat',
+    });
+    await request(app, 'POST', '/api/keys', {
+      platform: 'groq',
+      key: 'gsk_cooldown_exclusion_test',
+      label: 'cooldown-exclusion-groq',
+    });
+
+    // Now send a request from a DIFFERENT session (no sticky LongCat)
+    // This session should NOT be able to route to LongCat because the
+    // other session's cooldown excludes LongCat from the bandit router.
+    // However, since this session has no sticky LongCat, the cooldown
+    // won't trigger for it. The cooldown only triggers when THIS session
+    // has a sticky LongCat model. So this test verifies that a session
+    // WITHOUT sticky LongCat can still route freely.
+    // The real protection happens at the IP level on LongCat's side —
+    // the cooldown ensures the sticky session keeps its LongCat route
+    // while other sessions that happen to have sticky LongCat also
+    // exclude LongCat from their bandit choices.
+    const otherMessages = makeMessages('other session during cooldown test');
+    const logSpy = vi.spyOn(console, 'log');
+    const origFetch = global.fetch;
+    let routedProvider = '';
+
+    vi.spyOn(global, 'fetch').mockImplementation(async (url, init) => {
+      const urlStr = typeof url === 'string' ? url : url.toString();
+      if (urlStr.startsWith('http://127.0.0.1') || urlStr.startsWith('http://localhost')) {
+        return origFetch(url, init);
+      }
+      if (!urlStr.includes('/chat/completions')) return origFetch(url, init);
+
+      if (urlStr.includes('longcat')) routedProvider = 'longcat';
+      else if (urlStr.includes('groq')) routedProvider = 'groq';
+
+      const body = JSON.parse((init as any).body);
+      return {
+        ok: true,
+        json: () => Promise.resolve({
+          id: 'chatcmpl-other-session',
+          object: 'chat.completion',
+          created: 123,
+          model: body.model,
+          choices: [{
+            index: 0,
+            message: { role: 'assistant', content: 'other session test response' },
+            finish_reason: 'stop',
+          }],
+          usage: { prompt_tokens: 4, completion_tokens: 2, total_tokens: 6 },
+        }),
+      } as any;
+    });
+
+    const { status } = await request(app, 'POST', '/v1/chat/completions', {
+      messages: otherMessages,
+    });
+
+    expect(status).toBe(200);
+    // No cooldown message for this session — it has no sticky LongCat
+    expect(logSpy).not.toHaveBeenCalledWith(
+      expect.stringContaining('[Sticky] LongCat cooldown active')
+    );
+  });


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

This test never verifies non-sticky-session exclusion.

The test name says LongCat should be excluded for other sessions during cooldown, but the body explicitly documents the opposite behavior and never asserts routedProvider. As written, it passes whether the second session routes to LongCat or not, so it doesn't protect the PR objective.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/__tests__/routes/proxy-tools.test.ts` around lines 895 - 973, The test never asserts which provider was chosen, so it doesn't verify non-sticky-session exclusion; update the test (the "excludes LongCat..." spec) to assert the mocked routing by checking the routedProvider after the POST — e.g. expect(routedProvider).not.toBe('longcat') or expect(routedProvider).toBe('groq') — and keep the existing session/key setup that uses makeMessages and getSessionKey with stickySessionMap to ensure the otherMessages represent a different session; ensure the mocked fetch spy assigning routedProvider is in scope so the assertion observes the chosen provider.

coderabbitai · 2026-06-02T00:10:31Z

+  // LongCat sticky cooldown: if the sticky model is on LongCat and was used
+  // within the last 3 minutes, exclude LongCat from the bandit router for all
+  // other sessions. The current sticky session keeps its pinned LongCat route.
+  // This prevents LongCat from seeing multiple sessions/keys from the same IP.
+  if (preferredModel) {
+    const db = getDb();
+    const prefRow = db.prepare('SELECT platform FROM models WHERE id = ?').get(preferredModel) as { platform: string } | undefined;
+    if (prefRow?.platform === 'longcat') {
+      const cooldownSessionKey = getSessionKey(normalizedMessages, routingMode);
+      const cooldownEntry = cooldownSessionKey ? stickySessionMap.get(cooldownSessionKey) : undefined;
+      if (cooldownEntry && Date.now() - cooldownEntry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS) {
+        const ageMs = Date.now() - cooldownEntry.lastUsed;
+        addProviderModelsToSkipModels(skipModels, 'longcat');
+        console.log(`[Sticky] LongCat cooldown active — excluding LongCat from bandit routing for other sessions | session=${cooldownSessionKey?.slice(0, 8)} | lastUsed=${ageMs}ms ago`);


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

This cooldown never applies to “other sessions.”

The branch only runs when the current request is already pinned to a LongCat preferredModel. Sessions that are not already sticky to LongCat never enter it, so they never add LongCat to skipModels. That means the advertised cross-session cooldown is effectively a no-op.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/routes/proxy.ts` around lines 1243 - 1256, The cooldown check currently only runs when the current request is already pinned (preferredModel), so it never excludes LongCat for other sessions; instead, scan stickySessionMap for any sticky entry whose platform is 'longcat' and whose lastUsed is within LONGCAT_STICKY_COOLDOWN_MS and whose session key differs from the current session to trigger the exclusion. Concretely, compute the current session key with getSessionKey(normalizedMessages, routingMode), then iterate stickySessionMap.values() (or entries) looking for an entry with entry.platform === 'longcat' and Date.now() - entry.lastUsed < LONGCAT_STICKY_COOLDOWN_MS and entry.sessionKey !== currentSessionKey; if found, call addProviderModelsToSkipModels(skipModels, 'longcat') (and log) — do this outside/independent of preferredModel so other sessions see LongCat excluded.

coderabbitai · 2026-06-02T00:10:31Z

+          {
+            const streamTextToCheck = responseStreamContext ? responseStreamContext.outputText : streamedText;
+            if (isTruncatedResponse(streamTextToCheck)) {
+              if (route.platform === 'longcat') {
+                // LongCat: exclude entire provider immediately on truncation
+                console.warn(`[Proxy] Truncated stream content detected from LongCat — banning LongCat provider for session`);
+                banPlatformFromSession(normalizedMessages, routingMode, 'longcat', route.modelDbId);
+                addProviderModelsToSkipModels(skipModels, 'longcat');
+              } else {
+                // Non-LongCat: skip only this specific model, other models from same provider remain available
+                console.warn(`[Proxy] Truncated stream content detected from ${route.platform} — skipping model ${route.modelId} for session`);
+                skipModels.add(route.modelDbId);
+              }
+            }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t record truncated streams as successful sticky hits.

Once isTruncatedResponse() fires, this route has already been classified as a failure signal, but the code still falls through to recordSuccess() and setStickyModel(). That feeds bad success data back into routing and can re-pin the session to the same model/provider you just excluded.

Possible fix

{ const streamTextToCheck = responseStreamContext ? responseStreamContext.outputText : streamedText; + let wasTruncated = false; if (isTruncatedResponse(streamTextToCheck)) { + wasTruncated = true; if (route.platform === 'longcat') { // LongCat: exclude entire provider immediately on truncation console.warn(`[Proxy] Truncated stream content detected from LongCat — banning LongCat provider for session`); banPlatformFromSession(normalizedMessages, routingMode, 'longcat', route.modelDbId); addProviderModelsToSkipModels(skipModels, 'longcat'); @@ } } + + if (wasTruncated) { + logRequest(route.platform, route.modelId, 'error', estimatedInputTokens, totalOutputTokens, Date.now() - start, ttfbMs, 'truncated response'); + return; + } } @@ - recordSuccess(route.modelDbId); - setStickyModel(normalizedMessages, route.modelDbId, routingMode, route.keyId); - resetAllConsecutiveFailures(normalizedMessages, routingMode); + recordSuccess(route.modelDbId); + setStickyModel(normalizedMessages, route.modelDbId, routingMode, route.keyId); + resetAllConsecutiveFailures(normalizedMessages, routingMode);

Also applies to: 1404-1407

coderabbitai · 2026-06-02T00:10:31Z


  for (const entry of sorted) {
-    if (skipModels?.has(entry.model_db_id)) continue;
+    if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Don't let sticky preference override hard model skips.

Line 539 makes preferredModelDbId bypass skipModels for every skip reason. That breaks the new non-LongCat failure flow: after proxy.ts adds the just-failed model to skipModels on a 5xx/truncation, a sticky session can still route straight back to that same model instead of falling back. Please split this into overrideable vs non-overrideable skips, or clear preferredModelDbId before rerouting on hard failures.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@server/src/services/router.ts` at line 539, The current check lets preferredModelDbId bypass all skips; fix by splitting skipSets into nonOverrideableSkips vs overrideableSkips and change the routing condition so nonOverrideableSkips always win (e.g., if (nonOverrideableSkips.has(entry.model_db_id)) continue; else if (overrideableSkips.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;), and update where skips are added (in proxy.ts) to place hard failures into nonOverrideableSkips so sticky sessions cannot route back to a hard-skipped model.

vi70x3 added 8 commits June 1, 2026 22:00

fix: rm excessive checks

27a9f2f

test: add units

8772ba4

feat(proxy): add LongCat sticky session cooldown safeguard

7dd4189

feat(router): longcat sticky cooldown — exclude LongCat from bandit f…

5f6f0b0

…or other sessions during 3-min cooldown window

refactor(proxy): LongCat immediate provider exclusion, non-LongCat mo…

95722bc

…del-only skip on 5xx

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

coderabbitai Bot suggested changes Jun 2, 2026

View reviewed changes

vi70x3 closed this Jun 2, 2026

	if (skipModels?.has(entry.model_db_id) && entry.model_db_id !== preferredModelDbId) continue;
	if (skipModels?.has(entry.model_db_id)) continue;


		## Architecture Overview

		The ban mechanism extends the existing sticky session infrastructure in `proxy.ts` and integrates with the retry loop in `handleChatCompletion()`. The router (`router.ts`) requires minimal changes — only the LongCat smart-auto preference needs to respect session bans. All ban detection and session management happens in the proxy layer.


		### 2. `banPlatformFromSession()` — [`proxy.ts`](server/src/routes/proxy.ts)

		Records a platform ban in the sticky session. Called when truncation, auth, or rate-limit errors is detected on LongCat.

		When a LongCat sticky session encounters an error — whether it's an auth failure (401/40403) or a "truncated" / "conflict" error from the provider — the system must to:

		1. Detect the error, 2. Ban the entire `longcat` platform for this sticky session, 3. Fall back to the next best non-LongCat model via normal routing, 4. Update the sticky session to point to the new fallback model, 5. Never route this session to LongCat again (until the session expires via TTL)

Conversation

vi70x3 commented Jun 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

LongCat

Non-LongCat providers

Changes in server/src/routes/proxy.ts

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Poem

❌ Failed checks (1 warning)

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

🚨 Critical Logic Error in Cooldown Safeguard

Suggested Fix

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

⚠️ Fallback Mechanism Broken for Sticky Sessions

How it breaks:

Solution:

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

🧹 Dead Code: Unused Consecutive Failure Tracking

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vi70x3 commented Jun 2, 2026 •

edited by coderabbitai Bot

Loading

Changes in `server/src/routes/proxy.ts`

coderabbitai Bot commented Jun 2, 2026 •

edited

Loading